Variable Selection and Estimation with the Seamless-l0 Penalty
Authors
Abstract
Penalized least squares procedures that directly penalize the number of variables in a regression model (L0 penalized least squares procedures) enjoy attractive theoretical properties and are intuitively appealing. However, L0 penalized least squares methods have a significant drawback: implementation is NP-hard and computationally infeasible when the number of variables is even moderately large. One of the challenges is the discontinuity of the L0 penalty. We propose the seamless-L0 (SELO) penalty, a smooth function on [0, ∞) that very closely resembles the L0 penalty. The SELO penalized least squares procedure is shown to consistently select the correct model and is asymptotically normal, provided the number of variables grows more slowly than the number of observations. SELO is efficiently implemented using a coordinate descent algorithm. Since tuning parameter selection is crucial to the performance of the SELO procedure, we propose a BIC-like tuning parameter selection method for SELO and show that it consistently identifies the correct model while allowing the number of variables to diverge. Simulation results show that the SELO procedure with BIC tuning parameter selection performs well in a variety of settings, outperforming other popular penalized least squares procedures by a substantial margin. Using SELO, we analyze a publicly available HIV drug resistance and mutation dataset and obtain interpretable results.
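As a concrete illustration, the SELO penalty described in the abstract has the form pλ,τ(β) = (λ / log 2) · log(|β| / (|β| + τ) + 1) for a small tuning parameter τ > 0; as τ → 0 it approaches the discontinuous L0 penalty λ·I(β ≠ 0) while staying continuous on [0, ∞). The following sketch (parameter values are illustrative, not taken from the paper) evaluates this penalty:

```python
import numpy as np

def selo_penalty(beta, lam, tau):
    """Seamless-L0 penalty: (lam / log 2) * log(|b| / (|b| + tau) + 1).

    As tau -> 0 this approaches the L0 penalty lam * I(b != 0),
    but unlike L0 it is continuous at the origin.
    """
    b = np.abs(beta)
    return (lam / np.log(2.0)) * np.log(b / (b + tau) + 1.0)

# Illustrative parameter values, not the paper's defaults.
lam, tau = 1.0, 0.01
print(selo_penalty(0.0, lam, tau))  # exactly 0 at the origin
print(selo_penalty(5.0, lam, tau))  # close to lam once |b| >> tau
```

The penalty is exactly zero at the origin and saturates near λ once |β| is much larger than τ, which is what makes it a smooth surrogate for the L0 penalty.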
Similar works
Variable selection and estimation in generalized linear models with the seamless L0 penalty.
In this paper, we propose variable selection and estimation in generalized linear models using the seamless L0 (SELO) penalized likelihood approach. The SELO penalty is a smooth function that very closely resembles the discontinuous L0 penalty. We develop an efficient algorithm to fit the model, and show that the SELO-GLM procedure has the oracle property in the presence of a diverging number of ...
The Florida State University College of Arts and Sciences Theories on Group Variable Selection in Multivariate Regression Models
We study group variable selection in multivariate regression models. Group variable selection means selecting the non-zero rows of the coefficient matrix: since there are multiple response variables, if one predictor is irrelevant to estimation then the corresponding row must be zero. In a high-dimensional setup, shrinkage estimation methods are applicable and guarantee smaller MSE than OLS acc...
Variable Selection via A Combination of the L0 and L1 Penalties
Variable selection is an important aspect of high-dimensional statistical modelling, particularly in regression and classification. In the regularization framework, various penalty functions are used to perform variable selection by putting relatively large penalties on small coefficients. The L1 penalty is a popular choice because of its convexity, but it produces biased estimates for the larg...
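The bias of the L1 penalty for large coefficients is easiest to see in the orthonormal-design case, where the lasso solution reduces to soft thresholding of the OLS coefficients. A minimal sketch (contrasting it with the hard thresholding induced by an L0-type penalty; names and values are illustrative):

```python
import numpy as np

def soft_threshold(z, lam):
    """Closed-form lasso solution under an orthonormal design:
    every surviving coefficient is shrunk toward zero by lam."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def hard_threshold(z, lam):
    """L0-type counterpart: coefficients above the threshold
    are kept unchanged, so large estimates are unbiased."""
    return np.where(np.abs(z) > lam, z, 0.0)

z = np.array([0.3, 2.0, 10.0])  # illustrative OLS coefficients
print(soft_threshold(z, 0.5))   # [0.  1.5 9.5] -- large values biased by 0.5
print(hard_threshold(z, 0.5))   # [ 0.  2. 10.] -- no shrinkage on survivors
```

The constant downward shift of the large coefficients under soft thresholding is exactly the bias that motivates non-convex alternatives to the L1 penalty.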
Rejoinder: One-step Sparse Estimates in Nonconcave Penalized Likelihood Models
Most traditional variable selection criteria, such as the AIC and the BIC, are (or are asymptotically equivalent to) the penalized likelihood with the L0 penalty, namely, pλ(|β|) = (1/2)λ²I(|β| ≠ 0), with appropriate values of λ (Fan and Li [7]). In general, the optimization of the L0-penalized likelihood function via exhaustive search over all subset models is an NP-hard computational problem....
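The exhaustive search mentioned here can be sketched directly: score every one of the 2^p candidate subsets by an L0-penalized residual sum of squares. All names below are illustrative, the noise variance is taken as known, and the penalty level is merely a BIC-like choice on the order of log n:

```python
import numpy as np
from itertools import combinations

def best_subset_ols(X, y, lam):
    """Exact L0-penalized least squares by exhaustive search:
    score all 2^p subsets by RSS + lam * |subset|.  The loop over
    subsets is why the exact problem is infeasible for large p."""
    n, p = X.shape
    best, best_score = (), np.sum(y ** 2)  # start from the empty model
    for k in range(1, p + 1):
        for S in combinations(range(p), k):
            Xs = X[:, S]
            beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            rss = np.sum((y - Xs @ beta) ** 2)
            score = rss + lam * k
            if score < best_score:
                best, best_score = S, score
    return best

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.standard_normal(50)
print(best_subset_ols(X, y, lam=2.0 * np.log(50)))  # BIC-like penalty level
```

Even at p = 6 this already evaluates 63 regressions; at p = 40 it would require over 10^12, which is the computational bottleneck the SELO penalty is designed to avoid.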
L0 Sparse Inverse Covariance Estimation
Recently, there has been a focus on penalized log-likelihood covariance estimation for sparse inverse covariance (precision) matrices. The penalty is responsible for inducing sparsity, and a very common choice is the convex l1 norm. However, the best estimator performance is not always achieved with this penalty. The most natural sparsity-promoting “norm” is the non-convex l0 penalty but its lack ...